[PROTOCOL RFC] Full Void Type Support by ZiyaZa · Pull Request #7073 · delta-io/delta

ZiyaZa · 2026-06-23T13:05:28Z

Which Delta project/connector is this regarding?

Description

Associated GitHub issue for discussions: #7072

This PR adds the proposed protocol change for full VOID support everywhere in a Delta table schema. The current protocol is being clarified in #6966 to specify how VOID must be handled today. This RFC goes further and defines a new table feature that allows tables to persist VOID columns as the UNKNOWN type in Parquet, which lifts the schema limitations we have today.

How was this patch tested?

N/A

Does this PR introduce any user-facing changes?

Creates a new Protocol RFC.

qiyuandong-db · 2026-06-25T12:56:22Z

+## Void columns without the table feature
+
+When the `voidType` feature is not supported, `void` columns can only be **omitted**. Because a `void` column is never written to a data file, writers must reject **writing data** to a table whose schema contains any of the following shapes, in which omitting the `void` column(s) would leave nowhere to record the nullability or length of an enclosing value:
+- a `void` type directly inside an `array` or `map` at any nesting level;


For maps, shall we specify that it is allowed only for the value? I recall that we don't allow VOID keys anyway, right?

Also, by 'directly' we mean arrays like ARRAY<VOID>. With this feature, are we also going to unblock void nested indirectly inside an array or map, like ARRAY<STRUCT<INT, VOID>>?

> For maps, shall we specify that it is allowed only for the value? I recall that we don't allow VOID keys anyway, right?

That is a Spark limitation, and I don't think it belongs in the Delta protocol. FWIW, the protocol does not say anything about the nullability of map keys in general.

> Also, by 'directly' we mean arrays like ARRAY<VOID>. With this feature, are we also going to unblock void nested indirectly inside an array or map, like ARRAY<STRUCT<INT, VOID>>?

Indirect voids are already unblocked. The Spark connector blocks them as an implementation detail, but there is otherwise no reason to block them. This feature does not affect those voids.
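The distinction between direct and indirect voids discussed above can be sketched as a schema walk. This is only an illustration, not the RFC's reference logic: it assumes the Delta schema serialization layout (`struct` with `fields`, `array` with `elementType`, `map` with `keyType` and `valueType`), assumes the void type is serialized as the string `"void"`, and treats a struct with no fields as not requiring the feature. The helper names (`struct`, `array`, `map_of`, `has_structural_void`) are hypothetical.

```python
from typing import Any

VOID = "void"  # assumption: serialized name of the void type


def struct(**fields: Any) -> dict:
    """Hypothetical helper: build a struct type from name=type pairs."""
    return {
        "type": "struct",
        "fields": [
            {"name": n, "type": t, "nullable": True, "metadata": {}}
            for n, t in fields.items()
        ],
    }


def array(element: Any) -> dict:
    return {"type": "array", "elementType": element, "containsNull": True}


def map_of(key: Any, value: Any) -> dict:
    return {"type": "map", "keyType": key, "valueType": value,
            "valueContainsNull": True}


def has_structural_void(dtype: Any) -> bool:
    """True if dtype contains a shape that cannot be written by omission."""
    if not isinstance(dtype, dict):
        return False  # a primitive type name, including a lone "void" field
    kind = dtype.get("type")
    if kind == "struct":
        types = [f["type"] for f in dtype["fields"]]
        # A struct (or the table itself) whose fields are all void.
        if types and all(t == VOID for t in types):
            return True
        return any(has_structural_void(t) for t in types)
    if kind == "array":
        elem = dtype["elementType"]
        # A void directly inside an array, or a structural shape deeper down.
        return elem == VOID or has_structural_void(elem)
    if kind == "map":
        # Keys are checked too: the protocol text does not exclude void keys.
        parts = (dtype["keyType"], dtype["valueType"])
        return any(p == VOID or has_structural_void(p) for p in parts)
    return False


# ARRAY<VOID> is a direct void; ARRAY<STRUCT<INT, VOID>> is an indirect one.
direct = struct(a="integer", b=array(VOID))
indirect = struct(a=array(struct(x="integer", y=VOID)))
print(has_structural_void(direct), has_structural_void(indirect))
```

Because the table schema is itself a struct, the "table whose columns are all void" case falls out of the struct branch without special handling.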

qiyuandong-db · 2026-06-25T13:05:31Z

+
+When Void Type is supported (when the `writerFeatures` field of a table's `protocol` action contains `voidType`), writers:
+- must store the table's structural `void` columns as `UNKNOWN` (see [Structural void columns](#structural-void-columns)).
+- may store any non-structural `void` column either by omission or as an `UNKNOWN` column.


Why don't we force writers to always write non-structural void columns by omission?

Because that would make the protocol more difficult to implement and is not strictly necessary. Having fewer UNKNOWN columns only makes dropping the feature easier.

But I'll update it to "should" to show it's the preferred behavior.
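A minimal sketch of the writer rule discussed in this thread, assuming the caller already knows which `void` columns are structural. The function names and the `"omit"` / `"UNKNOWN"` labels are hypothetical. Only the `writerFeatures` field of the `protocol` action and the `voidType` feature name come from the RFC text.

```python
from typing import Dict


def void_type_write_supported(protocol: Dict) -> bool:
    """True when the protocol action lists `voidType` in writerFeatures."""
    return "voidType" in (protocol.get("writerFeatures") or [])


def plan_void_storage(protocol: Dict, void_columns: Dict[str, bool]) -> Dict[str, str]:
    """Decide how each void column is stored in a new data file.

    `void_columns` maps a column path to whether the column is structural.
    """
    feature_on = void_type_write_supported(protocol)
    plan = {}
    for path, structural in void_columns.items():
        if structural:
            if not feature_on:
                # Without the feature, writers must reject writing data.
                raise ValueError(f"{path}: structural void requires voidType")
            plan[path] = "UNKNOWN"  # must be materialized
        else:
            # Omission is the preferred ("should") representation. With the
            # feature enabled, storing UNKNOWN would also be permitted.
            plan[path] = "omit"
    return plan


protocol = {"minReaderVersion": 3, "minWriterVersion": 7,
            "readerFeatures": ["voidType"], "writerFeatures": ["voidType"]}
print(plan_void_storage(protocol, {"a": False, "arr.element": True}))
```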

qiyuandong-db · 2026-06-25T13:23:25Z

+
+A `void` column in any other position is never structural: it can be omitted, and does not require the feature. A schema that contains one of the shapes above is said to **require** the `voidType` feature.
+
+### Writer Requirements for Void Type


Shall we also explain how writers should handle statistics for VOID columns?

I don't think we need any special handling for stats, so the default rules from the protocol should be sufficient.

c27kwan

Nice proposal!

c27kwan · 2026-06-26T12:01:23Z

+# Void Type
+**Associated Github issue for discussions: https://github.com/delta-io/delta/issues/7072**
+
+This protocol change adds support for using the `void` data type (also known as `NullType` in Spark, `UnknownType` in Iceberg, and `UNKNOWN` in Parquet) anywhere in a Delta table schema, via a new reader/writer table feature, `voidType`.


Suggested change

-This protocol change adds support for using the `void` data type (also known as `NullType` in Spark, `UnknownType` in Iceberg, and `UNKNOWN` in Parquet) anywhere in a Delta table schema, via a new reader/writer table feature, `voidType`.
+The `voidType` reader/writer table feature adds support for using the `void` data type (also known as `NullType` in Spark, `UnknownType` in Iceberg, and `UNKNOWN` in Parquet) anywhere in a Delta table schema.

c27kwan · 2026-06-26T12:04:01Z

+
+`void` is a data type with a single possible value: `NULL`. A column ends up with this type when the writer has no information about its actual type, typically because every value observed so far has been `NULL` (for example, `CREATE TABLE t AS SELECT NULL AS a`, or schema evolution that adds a column containing only `NULL`s).
+
+Today, `void` columns are represented by omitting them from data files and reconstructing them as all-`NULL` columns on read (the missing columns mechanism). That representation cannot encode four schema shapes - a table whose columns are all `void`, a `struct` whose fields are all `void`, a `void` nested in an `array`, and a `void` nested in a `map` - because in each case omitting the `void` column(s) would leave the enclosing `struct`, `array`, or `map` (or the table itself) with nothing written to a data file, and therefore nowhere to record whether the enclosing value is `NULL`, empty, or how long it is. Writers must reject writing data in those cases.


That's true only for some engines like Spark. I think it's officially undefined since it is not supported.

It's becoming official in #6966. I built this RFC assuming that protocol clarification makes it in.

Before that change, the protocol basically said that tables can have void columns and that the behavior is undefined, but it recommended dropping them on read.

c27kwan · 2026-06-26T12:16:30Z

+- a `struct` (at any nesting level) whose fields are all `void`; or
+- a table whose columns are all `void`.
+
+These restrictions are stated in terms of the **table schema**, not the schema of any individual data file. A table with such a schema can still be created, altered through metadata-only operations, and read. It can be made writable by evolving its schema - for example, by changing a `void` column to another type - or by enabling the `voidType` feature.


If the column is omitted, then the schemas of the individual data files are all the same, right?

There can be differences due to schema evolution/type widening. Voids will always be missing, but other columns can differ.
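To illustrate "voids will always be missing", here is a rough sketch of deriving the columns a writer would put in a data file when every `void` is stored by omission. It assumes the Delta schema serialization layout, assumes the void type is serialized as `"void"`, and assumes the schema contains none of the shapes that forbid omission. `drop_void_fields` is a hypothetical name.

```python
from typing import Any


def drop_void_fields(dtype: Any) -> Any:
    """Return a copy of dtype with every void struct field removed."""
    if not isinstance(dtype, dict):
        return dtype  # primitive type name, nothing to prune
    kind = dtype.get("type")
    if kind == "struct":
        kept = [
            dict(f, type=drop_void_fields(f["type"]))
            for f in dtype["fields"]
            if f["type"] != "void"  # omitted columns never reach the file
        ]
        return dict(dtype, fields=kept)
    if kind == "array":
        return dict(dtype, elementType=drop_void_fields(dtype["elementType"]))
    if kind == "map":
        return dict(
            dtype,
            keyType=drop_void_fields(dtype["keyType"]),
            valueType=drop_void_fields(dtype["valueType"]),
        )
    return dtype


table_schema = {"type": "struct", "fields": [
    {"name": "a", "type": "integer"},
    {"name": "b", "type": "void"},
    {"name": "c", "type": {"type": "struct", "fields": [
        {"name": "x", "type": "string"},
        {"name": "y", "type": "void"},
    ]}},
]}
file_schema = drop_void_fields(table_schema)
print([f["name"] for f in file_schema["fields"]])
```

Other differences between data files, such as those caused by schema evolution or type widening, are outside what this sketch models.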

c27kwan · 2026-06-26T12:17:53Z

+
+A `void` column may be changed to any other data type through supported schema-evolution operations; this does not require the [Type Widening](/PROTOCOL.md#type-widening) table feature, even when the `void` column is stored as `UNKNOWN`.
+
+## Void columns without the table feature


I'm not sure we can enforce anything on the case without a table feature. A legacy writer/reader does not have knowledge of this new proposal and cannot retroactively ban certain operations.

It seems to me, if someone wants the void type to behave as expected, they must have the table feature and from there it's up to the engine whether they want to omit or materialize the void type column.

The situation is not ideal. void never officially made it into the protocol; it was accidentally introduced to tables by the Spark connector when Spark gained NullType support. There were then various revisions to the protocol to make sure void is mentioned there, but they were all vague, because even the Spark connector did not handle it properly, which caused query failures. After #6966, the behavior will be defined, and both the Spark connector and kernel-rs follow that version of the protocol.

I understand that external clients may now become non-compliant with the protocol. But if they previously managed to read what Spark (which I think is the reference implementation) wrote, and wrote data that Spark could read, then they should still be protocol-compliant. In any case, this comment is more for #6966 than this PR.

c27kwan · 2026-06-26T12:25:40Z

+### Reader Requirements for Void Type
+
+When Void Type is supported (when the `readerFeatures` field of a table's `protocol` action contains `voidType`), readers:
+- must recognize and tolerate a `void` data type anywhere in a Delta table schema.


This phrasing is a bit weird. Maybe "must allow"?

c27kwan · 2026-06-26T12:26:57Z

+
+When Void Type is supported (when the `readerFeatures` field of a table's `protocol` action contains `voidType`), readers:
+- must recognize and tolerate a `void` data type anywhere in a Delta table schema.
+- must read a `void` column stored as `UNKNOWN` as an all-`NULL` column.


Specify Parquet here, since this seems targeted at it.

Although maybe a more neutral way to frame this is "must return only null values for columns defined as void in the table schema". Whether the column is omitted or materialized, that is the actual behaviour. It's less about the underlying data files' schema than it is about the actual Delta table's schema.

Reworded it so that it does not mention any type and just says to return all nulls, independent of the representation.
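The reworded reader rule can be illustrated with a small sketch: a column declared `void` in the table schema reads as NULL whether the data file omitted it or materialized it. Rows are modeled as plain dictionaries and only top-level columns are handled. `read_row` and the `table_columns` mapping are hypothetical and do not reflect any connector's API.

```python
from typing import Any, Dict, Optional


def read_row(file_row: Dict[str, Any],
             table_columns: Dict[str, str]) -> Dict[str, Optional[Any]]:
    """Project one data-file row onto the table schema.

    `table_columns` maps a top-level column name to its type name.
    """
    out = {}
    for name, type_name in table_columns.items():
        if type_name == "void":
            # Driven by the table schema, not by what the file contains.
            out[name] = None
        else:
            # Columns missing from the file are also read as NULL.
            out[name] = file_row.get(name)
    return out


table_columns = {"id": "integer", "v": "void"}
omitted = read_row({"id": 1}, table_columns)
materialized = read_row({"id": 1, "v": None}, table_columns)
print(omitted == materialized)
```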

[PROTOCOL RFC] Full Void Type Support

c593bdc

qiyuandong-db reviewed Jun 25, 2026

Address comments

a09d6e4

c27kwan reviewed Jun 26, 2026

Address comments

a5327ea

